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ABSTRACT 

The accurate and precise removal of 21-cm foregrounds from Epoch of Reionization 
redshifted 21-cm emission data is essential if we are to gain insight into an unexplored 
cosmological era. We apply a non-parametric technique, Generalized Morphological 
Component Analysis or GMCA, to simulated LOFAR-EoR data and show that it has 
the ability to clean the foregrounds with high accuracy. We recover the 21-cm ID, 2D 
and 3D power spectra with high accuracy across an impressive range of frequencies and 
scales. We show that GMCA preserves the 21-cm phase information, especially when 
the smallest spatial scale data is discarded. While it has been shown that LOFAR-EoR 
image recovery is theoretically possible using image smoothing, we add that wavelet 
decomposition is an efficient way of recovering 21-cm signal maps to the same or greater 
order of accuracy with more flexibility. By comparing the GMCA output residual maps 
(equal to the noise, 21-cm signal and any foreground fitting errors) with the 21-cm 
maps at one frequency and discarding the smaller wavelet scale information, we find a 
correlation coefficient of 0.689, compared to 0.588 for the equivalent ly smoothed image. 
Considering only the central 50% of the maps, these coefficients improve to 0.905 and 
0.605 respectively and we conclude that wavelet decomposition is a significantly more 
powerful method to denoise reconstructed 21-cm maps than smoothing. 

Key words: cosmology: theory - dark ages, reionization, first stars - diffuse radiation 
- methods: statistical. 



1 INTRODUCTION 

When the first ionizing sources appeared 400 million years 
after the Big Bang, the Universe emerged from the 'Dark 
Ages' and began to be reionized. This Epoch of Reionization 
(EoR) is on the verge of being directly observed for the first 
time, with a new generation of radio telescopes beginning to 
see first light (e.g. Low Frequency Array (LOFAR^J Giant 
Metrewave Radio Telescope (GMRTfl Murchison Wide- 



field Array (MWA^Precision Array to Probe the Epoch of 
Reionization (PAPERQ 21 Centimeter Array (21CMA0. 

The vast majority of EoR observations will take advan- 
tage of the 21-cm spectral line - produced by a spin flip in 
neutral hydrogen (van de Hulst|1945||Ewen fe Purcell|1951 



Muller fe Oort|1951| ). This~21 -cm radiation can be observed 
interferometrically at radio wavelengths as a deviation from 
the brightness temperature of the CMB (|Field||1958| |Field| 



1959 



Madau, Meiksin fe Rees|| 1 99 7| [Shaver et al.||1999| ). 



Observationally, the 21-cm signal will be accompanied 
by systematic effects due to the ionosphere and instrument 
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response, system noise, extragalactic foregrounds and Galac- 



tic foregrounds (e.g. |Jelic et aL 2008 2010), the latter of 



which are orders of magnitude larger than the 21-cm signal 
we wish to detect. The foregrounds must be accurately and 
precisely removed from the observed data as any error at this 
stage has the ability to strongly affect the EoR 21-cm signal. 
Foreground removal and the implications for 21-cm cosmol- 
ogy have been extensively researched over the past decade 
(e.g |Di Matteo et al.||2002| |Oh fe Mack||2003| |Di Matteo, 



Ciardi &; Miniati|2004[ Zaldarriaga, Furlanetto &; Hernquist 
2004| |Morales fe Hewitt|2004| |Santos, Cooray & Knox|2005 f 



Wang et al.||2006| |McQuinn et al.||2006| |Jelic et al.||2008| 
Gleser, Nusser &; Benson|2008| |Bowman, Morales &; Hewitt| 
2006 ;[Harker et al.|20091 [Liu, Tegmark fe Zaldarriaga|2009[ 



Liu et al.||2009| |Harker et al.||2010| |Liu &; T egmark 2011, 
Petrovic fc Oh|2011||Mao||2011||Liu fe Tegmark|2012l [Cho*] 



Lazarian & Timbie 20 12|Chapman et al.||2012| . This paper 
concentrates on the spectral fitting of the foregrounds and 
assumes that bright sources have been accurately removed, 
for example via a flux cut ( Di Matteo et al~||2004|) . This is 
a safe assumption since, as recently shown by |Trott, Wayth] 
|fc Ti ngay 2012, the residuals from the bright source sub- 
traction are expected to be smaller than the thermal noise 
contribution and so are not a limiting factor. 

In our first paper, |Chapman et al.|(|2012D, we introduced 
a method to successfully remove the foregrounds while mak- 
ing only minimal assumptions. By introducing another simi- 
larly successful method, we acknowledge the possibility that 
different methods will be well suited for the extraction of dif- 
ferent information from the data and that there is an advan- 
tage in having several foreground cleaning methods to apply 
the data independently to confirm a statistical detection. In 
this paper we implement another non-parametric method, 
the sparsity-based blind source separation (BSS) technique 
Generalized Morphological Component Analysis, or GMCA. 
GMCA provides a complete basis set for foreground removal 
as opposed to a polynomial fitting method which, since an 
infinite basis set is required to describe the foregrounds ac- 
curately, leaves the foregrounds incompletely described. 

In Section [2] we summarise the various foreground re- 
moval pipelines which have been introduced in the literature 
so far and then go on to detail the method that we utilise, 
GMCA. In Section [3] we introduce our data cube and the 
methods used to produce it before presenting the statistical 
results in Section [4] In Section [5] we explore the possibility 
of recovering images of reionization before we set out our 
conclusions in Section [6] 



2 FOREGROUND REMOVAL TECHNIQUES 

The statistical detection of the 21-cm reionization signal de- 
pends on an accurate and robust method for removing the 
foregrounds from the total 21-cm signal. For a brief sum- 
mary of the recent 21-cm foreground removal literature we 



ask the reader to refer to Section 2 of Chapman et al. ( 2012 ). 

The majority of literature involves parametric methods, 
whereby at some point a certain form for the foregrounds is 
assumed, for example polynomials (e.g. |Santos et al.||2005| 



Wang et al. 


2006) 


McQuinn et al.|2006 


Bowman et al. 2006 ; 


Jelic et al. 


2008; 


Gleser et al. |2008| 


^iu, Tegmark & Zal-| 


darriaga|2009||Liu et al.|2009||Petrovic & Oh|2011|). In con- 



trast, non-parametric methods do not assume a specific form 
for the foregrounds, instead allowing the data to determine 
the foregrounds using more free parameters. While advan- 
tageous for poorly constrained data, results are often not 
as promising as parametric results, though recent methods 



have challenged this. Harker et al. ( 2009 2010 ) preferentially 



considered foreground models with as few inflection points 
as possible, which when applied to simulated LOFAR-EoR 
data compared very favourably with parametric methods. 
Similarly, fastica as presented in | Chapman et al. (2012) ac- 
curately recovered the 21-cm power spectra by considering 
the statistically independent components of the foregrounds. 
GMCA (|Bobin et al ||2007l |Bobin et al ||2008[ [Bobin, Starck,| 
Moudden fc Fadili||2008[ (Bobin et al.||2012| ) is another non- 
parametric method which has a greater flexibility through 
wavelet choice without the sacrifice of the blind nature of 
the approach. We will show that GMCA not only recovers the 
power spectra to high accuracy but also that, using wavelet 
decomposition, the simulated 21-cm signal maps can be re- 
covered exceedingly well after the foreground removal pro- 



2.1 The GMCA method 

The non-parametric method of removing the foregrounds is 



effectively a BSS problem. |Chapman et al. ( 2012| utilised 
statistical approach to source separation, namely fastica. 
This assumed that the components of the foregrounds were 
statistically independent and non- Gaussian in order to re- 
construct the smooth spectral form of the foregrounds and 
leave a residual signal from which we could identify the 21- 
cm emission statistics. This statistical pursuit of indepen- 
dence is only one form that BSS techniques take, the other 
utilising morphological diversity and sparsity to separate the 
sources. |Zibulevsky &; Pearlmutter] ( |2001| proposed a new 
method of BSS, where one could find a basis set in which 
the sources to be found would be sparsely represented, i.e. 
a basis set where only a few of the coefficients would be 
non-zero. With the sources being unlikely to have the same 
few non-zero coefficients one could then use this sparsity 
to more easily separate the mixture. The idea of exploit- 
ing the sparseness of sources in different bases has evolved 
into a full and diverse field of applications. The method has 
evolved to allow sources to have different morphologies, ex- 
ploit multichannel data and consider different bases for dif- 
ferent sources in order to achieve the most sparse represen- 
tations. 

Consider an observation of m maps each constituting t 
pixels across m channels of observation. The problem to be 
solved can be stated in the following manner: 



X = AS + N 



(1) 



where X is the mxt matrix representing the observed data, 
n is the number of sources to be estimated, S is the signal 
nxt matrix to be determined, A is the mxn mixing matrix 
and N is the mxt noise matrix. 

As this is a BSS problem, we need to estimate both 
S and A. We seek to find the 21-cm signal as a residual in 
the separation process, therefore S represents the foreground 
signal and, due to the extremely low signal-to-noise of this 
problem, the signal is numerically ignored by the method 
and can be thought of as an insignificant part of the noise. 
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We can expand the sources, Sj, in a wavelet basis $ 
which can mathematically be thought of as a matrix of 
wavelet waveforms such that \/j £ {1, ...,n} Sj = otj&. Sj 
is defined to be sparse if only a few of the o.j are significantly 
non-zero. 

The objective of GMCA is to seek an unmixing scheme, 
through the estimation of A, which yields the sparsest 
sources S in the wavelet domain. This is expressed by the 
following optimization problem written in the augmented 
Lagrangian form : 



3 = 1 



(2) 



where typically ||a|| p = (5^ fc |a[fc]| p )~"'; sparsity is gener- 
ally enforced for p = which measures the number of non- 
zero entries of a (or its relaxed convex version with p = 1) 

and ||X||f = (trace(X T X)) 1 2 is the Frobenius norm. The 
problem in Equation [2] is solved in an iterative two-step al- 
gorithm such that at each iteration q : 



(i) Estimation of the S for A fixed to A (g 



-i) . 



Solving the problem in Equation [2] for p = assum- 
ing A is fixed to A ((? - 1} , the sources are estimated as 
follows : 



;(<?) 



(a ( *- 1)+ x$ t ) 



where Aa stands for the hard-thresholding operator with 
threshold A; this operator puts to zero all coefficients with 
amplitudes lower than A. In practice, the threshold A is 
set to be equal to 3 times the standard deviation of the 
noise level to exclude noise coefficients. The term A^ -1 ^ + 
denotes the Moore pseudo- inverse of the matrix A^ 9-1 ^. 

(ii) Estimation of the A for S fixed to : 

Updating the mixing matrix assuming that the sources are 
known and fixed to is as follows : 

A (g) = XS 0?) + 

For more technical details about GMCA, we refer the inter- 
ested reader to (|Bobin et al.|2007[|Bobin et al.|2008[|Bobin, 



Starck, Moudden fe Fadili||2008[ |Bobin et al.||2012| ), where 
it is shown that sparsity, as used in GMCA, allows for a 
more precise estimation of the mixing matrix A and more 
robustness to noise than ICA-based techniques. 

GMCA provides an efficient method of separating the 
foreground signal from the noise and 21-cm signal by locat- 
ing the most sparse components that the foreground signal 
could be made of in the wavelet basis <!>. From a Bayesian 
point of view, using this sparsity method is equivalent to 
having an in-built prior in the model that the foregrounds 
are sparse over the basis chosen. While a Bayesian evidence 
analysis could be possible in order to compare different 
methods as well as different bases and quantize the over- 
fitting due to more free parameters, we consider it outside 
the scope of this project. 



2.2 Wavelets 

The set of basis functions, <E>, used by GMCA are called 
wavelet functions. 

The Fourier transform is a well known method of 
analysing data at different scales with a single set of basis 
functions - sines and cosines. In reality this confined basis 
set can obscure information, and so we instead consider an 
infinite set of basis functions localized in space - the wavelet 
functions. There are many types of wavelets - with some 
more localized in space, some smoother and some with frac- 
tal structures. 

The most common form of wavelet used in astrophysics 
is the isotropic undecimated wavelet transform (IUWT) 
which we describe briefly below in reference to the more 
complete descriptions in literature (e.g. |Starck, Murtagh"fc| 
Bijaoui| [l998l |Starck fe Murtagh||2006l ). Consider an image 
with p x p pixels (where, using the previous section's nota- 
tion p x p = t and we will refer to the pixel coordinates as 
[k,l]). 

We can decompose an image at a particular frequency, 
Co, into a coarse version of itself, cj, along with a superpo- 
sition of the original image at different wavelet scales: 



Co 



[k,l\=Cj[k,l\+^W j [k,l\ 



(3) 



3 = 1 



where the wavelet coefficient Wj represents the data at scale 
2~ j . 

The decomposition is typically achieved using low-pass 
ID filters, which we call hiD, implemented by the "a trous" 
algorithm: 

Cj+i[M] = ^2^2h 1D [p]h 1D [q]cj[k + 2 3 qJ + 2 3 p] (4) 

q p 

W 3 + l = C 3 l k , l ] ~ C 3 + l 1] (5) 

When Co can be described by only a few significantly non- 
zero Wj, we say that Co is sparse in that basis. 

In this paper we utilise wavelets twice. The first time 
is within the GMCA algorithm, where we have a choice of 
a multitude of different wavelet types. GMCA uses wavelet 
decomposition to identify the sources, S, but then returns 
a data cube with all scales present. In our analysis in Sec- 
tion |5] we wish to look at these results on different scales 
and so we utilise wavelet decomposition to do this. We will 
use the IUWT both within GMCA and later to analyse the 
images at different scales, though we briefly consider other 
wavelets when we question how much our results depend on 
this choice of wavelet. 

For our GMCA implementation we set the number of 
decomposition scales to be eight, the maximum allowed by 
the dimensions of our data cube. 



3 SIMULATED EOR DATA 

We simulate 170 frequency maps between 115 and 199.5 
MHz with spacings of 0.5 MHz. The maps consist of 512 2 
pixels representing a comoving 1734 Mpc 2 area. At z =10.8 
this is equivalent to a field of view of 10°xl0° or a reso- 
lution of 1.17 arcminutes per pixel. Since an interferometer 
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like LOFAR is insensitive to the mean value of the bright- 
ness temperature, we use mean-subtracted maps. We use the 



same set of simulations as iChapman et al. (2012) and the 



reader can refer there for more detailed information on the 
simulations. 

The observable of the 21-cm radiation, the 
brightness temperature Tb, is simulated using the 



semi-numeric modellin g tool 21CMFAST | Mesinger 
Furlanettol |20071 |Mesinger, Furlanetto k, Cen 



2011) . The "code was run using a standard cosmology, 



(Q A ,Q m ,Qb ,n, cr 8 , h)=(0. 72,0. 28,0.046, 0.96,0.82,73) pTomatfl u 
et al.||201l| ) and initialised at z=300 on a 1800 3 grid. The 
velocity fields used to perturb the initial conditions as well 
as the resulting 21-cm Tb boxes were formed on a cruder 
grid of 450 3 before being interpolated up to 512 3 . We define 
halos contributing ionizing photons as having a minimum 
virial mass of 1 x 1O 9 M0. 

The foreground simulations are obtained using the fore- 
ground models described in Jelic et al. (2008 2010). We 



model Galactic diffuse synchrotron emission (GDSE), Galac- 
tic localised synchrotron emission, Galactic diffuse free-free 
emission and extragalactic foregrounds consisting of con- 
tributions from radio galaxies and radio clusters. We do 
not consider the polarisation of the foregrounds. The fore- 
grounds simulated here can be up to five orders of magnitude 
larger than the signal we hope to detect but since interferom- 
eters such as LOFAR measure only fluctuations, foreground 
fluctuations dominate by 'only' three orders of magnitude. 

To simulate the noise at each frequency, a LOFAR mea- 
surement set was filled with Gaussian noise in the uv plane. 
This was then imaged to create a real-space image, the root 
mean square of which can be normalized to the value as given 



by the prescription detailed in Chapman et al. (2012). For 



example the noise sensitivity at 150 MHz for an integration 
time of 600 h, a PSF of width 4 arcminutes and a frequency 
spacing of 0.5 MHz is 64 mK. The 170 noise maps were un- 
corrected over frequency - i.e. a different noise realization 
was used to fill the measurement set for each frequency. 

The success of an interferometer such as LOFAR is 
highly dependent on how uv space is sampled. The particu- 
lar pattern of uv sampling forms a beam which affects how 
the components such as the foregrounds are seen by the in- 
terferometer. Dirty foreground and 21-cm images were simu- 
lated by convolving with the PSF of the LOFAR set-up used 
to simulate the noise. The PSF used for creating dirty im- 
ages (and for creating the noise as described in the previous 
section) was chosen to be the worst in the observation band- 
width - i.e. the PSF at 115 MHz. In observations the syn- 
thesized beam decreases in size with increasing frequency, 
causing point source signals to oscillate with the beam, pro- 
ducing a foreground signal with an oscillatory signal very 



much like that of the 21-cm signal (Vedantham, Shankar & 
Subrahmanyan|2012|) . However, this mode-mixing contribu- 



tion has been found not to threaten the 21-cm recovery and 
have a power well below the 21-cm level (|Bowman et al 



2006; |Liu, Tegmark fe~ZaT darriaga 2009j |Trott, WaythT 
Tingay 2 012| ). As such we do not consider a frequency de- 
pendent PSF here. 

Once the foregrounds and 21-cm signal have been ad- 
justed for uv sampling, the three component cubes are added 
together. The components of the total 5T\> along a random 
line of sight are shown in Fig. [I] 
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Figure 1. The redshift evolution of the simulated cosmo logical 
signal (red; dot), foregrounds (black;solid), noise (purple; dash) 
and total combined signal (blue; dash dot). All components have 
undergone the PSF convolution. Note the 21-cm signal has been 
amplified by 10 and displaced by -IK for clarity. 



4 RESULTS 

In the following section, the word 'reconstructed' refers to 
a component which has been estimated from the simulated 
data using GMCA. The 'residuals' or ('rest' for short) are 
the difference between the reconstructed and simulated fore- 
grounds and should therefore consist of the 21-cm signal, 
noise and any fitting errors. 



4.1 Source Number 

GMCA requires the specification of the number of sparse 
sources that the data can be defined by. We utilise this com- 
ponent separation technique to define the foregrounds in or- 
der to subtract them, treating the 21-cm signal as noise. 
Therefore, the number of sources refers to the number of 
foreground contributions which can be described by unique 
sparse descriptions (not necessarily the number of different 
foreground components such as Galactic free- free). 

As each foreground model might be best described by 
a different number of sources, we seek to define the number 
of sources by minimizing the leakage of foregrounds into the 
21-cm signal. Let us introduce the statistics Ly and Ry- 

L Y = A{A T A)~ 1 A T Y (6) 

R Y = Y - A(A T A)~ 1 A T Y (7) 

where A is the mixing matrix calculated by GMCA and Y is a 
data cube, for example the foregrounds or the 21-cm signal. 
Ly tells us the amount that the data Y contributes to the 
GMCA sources (in our case the reconstructed foregrounds). 
Thus we can calculate the amount of leakage of simulated 
noise and 21-cm signal, L noC s into the reconstructed fore- 
grounds by allowing Y to equal the combined simulated 21- 
cm and noise data cube. Conversely, Ry tells us the amount 
of data Y which does not contribute to the GMCA source 
model. For example, letting Y equal the simulated fore- 
ground cube, Rf g , will tell us how much of the foregrounds 
leak into the residuals. 

We take the power spectra of L noC s and Rf g and com- 
pare them to the power spectra of the 21-cm signal to see 
the number of sources for which leakage is minimized, Fig. 



2 




-8 f ] 

-2.0 -1.5 -1.0 -0.5 0.0 

log 10 (k per p/(h Mpc" 1 )) 

Figure 2. The 2D power spectrum of Rf g (blue dash) and L nocs 
(red dot) and the 21-cm power spectrum (black, solid) at a fre- 
quency of 160.0 MHz for GMCA with 2 and 3 sources (top and 
bottom respectively). 
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2 




Figure 3. The 2D power spectrum of Rf g (solid) and L noC s 
(dashed) at 160 MHz for: IUWT (black), Mallat's wavelet trans- 
form (red), Feauveau's wavelet transform without under-sampling 
(blue), Haar's wavelet transform (purple). 



others at certain frequencies. For our implementation we 
choose the IUWT as it consistently minimizes the leakages 
over the frequency range. We will show in Section [4. 3. 2| that 
we can potentially correct for L no , though not L cs and Rf g 
due to the blind nature of the problem. 



2] Note that, prior to our wavelet type analysis in Section 
4.2| we adopt the default setting of GMCA for this source 
number analysis, the IUWT using the a trous algorithm. 

To be confident of our reconstructed 21-cm signal, we 
ideally want both the power spectra of Rf g and L n0 cs to 
lie below that of the 21-cm signal and to be as small as 
possible. We see that there is no difference in the leakage 
power spectra whether the foregrounds are assumed to have 
two or three different components as defined by different 
sparse representations. Indeed, we find this hold true for four 
and five sources also. We choose to assume two foreground 
sources for the rest of this paper, though the reader should 
be aware that different foreground models may require dif- 
ferent source numbers in order to optimise the method. 



4.2 Choice of wavelet 

In Fig. [3] we consider how the leakages Rf g and L n0 cs change 
depending on the choice of wavelet used by GMCA. There are 
many different types of wavelets, but they can be broadly 
categorized by whether they are decimated (i.e. provide a 
redundant signal representation), isotropic and which filter 
they use for the separation of data at different scales. Here 
we consider the IUWT, a decimated wavelet transform (re- 
ferred to as Mallat), a non-dyadic and undecimated wavelet 
transform (referred to as Feauveau) and an undecimated 
wavelet transform using the Haar filter (referred to as Haar) . 

We can see that the choice of wavelet can affect how 
small the foreground leakages are, and therefore the suc- 
cess of the method. These differences are highly dependent 
on frequency as well, with one wavelet type out-performing 



4.3 Power Spectra 

EoR experiments aim to recover the power spectrum of the 
cosmological signal over a broad range of frequencies. 

The power spectrum of a line-of-sight/map/cube at a 
single frequency is calculated by 1D/2D/3D Fourier trans- 
forming that line-of-sight/map/cube and binning the pixels 
according to Fourier scale, k. The power at any particular 
k, {5(k)5*(k)) is the average power of all the uv cells in the 
bin centering on k. The error on the point for a particular 

bin, ki, is calculated as Gi — ^( fc ^j_L fc -ill w here rife, is the 

\/ Uk i 

number of uv cells that reside in that k bin. The power spec- 
trum of the reconstructed cosmological signal is calculated 
by subtraction of the noise power spectrum from the GMCA 
residuals power spectrum. The error on the simulated 21- 
cm power spectrum is added in quadrature with the error 
on the noise to reflect the error on the reconstructed 21-cm 
power spectrum. Note that we assume Gaussianity whereas 
the 21cm signal is not Gaussian and also we calculate the 
error bars from the power of a single realization rather than 
over an ensemble of simulations. We ask the reader to bear in 
mind that these error bars might be considered incomplete 
because of this. 

To explain a few graphical conventions: any points 
where the power of the residuals is below the power of the 
noise are omitted, as this leads to an unrealistic negative re- 
constructed 21-cm power; any error bars extending to below 
the x axis in linear space are shown with a lower error bar 
of equal length to the upper error bar in log space. 
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Figure 4. The ID power spectra of the simulated 21-cm (red, 
solid), the residuals (black, dot), the reconstructed 21-cm (purple, 
points) and the noise (blue, dash dot). Three 8 MHz frequency- 
wedges centred at 127 MHz, 151 MHz and 175 MHz respectively 
are shown from top to bottom. 
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Figure 5. Top : The leakage ratio for the 2D power spectra for fre- 
quencies of 130MHz (black,solid), 150MHz (red,dot) and 170MHz 
(blue, dash). Bottom : The leakage ratio for the 3D power spectra 
for frequencies of 135MHz (black, solid), 151MHz (red, dot) and 
167MHz (blue,dash). 



4-3.1 ID Power Spectra 

The ID power spectra are calculated over frequency wedges 
of 8MHz to avoid evolution effects. Each line of sight pro- 
duces a ID power spectrum, one for each of the 512 x 512 
pixels. These power spectra are then averaged over all the 
pixels. The frequencies mentioned correspond to the fre- 
quency in the middle of each 8MHz wedge and the quantity 
plotted in Figgis Aj D (k) = Lfc ^ (fc f (fc)) where L is the 
comoving length of the wedge. The ID power spectra are 
recovered to high accuracy across the frequency range and 
across the scales. 



4.3.2 2D and 3D Power Spectra 

For the 2D power spectra, the quantity plotted is A| D (/c) = 
Ak ( s (k)5 (fc)) w i iere a is the area of the simulation map. 
To calculate the 3D power spectra we divide the cube into 
sub-bands of 8 MHz to avoid signal evolution effects. The 
quantity plotted is A§ D (fc) = Vk ^ fc ^ where V is the 

volume of the sub-band. 

We now consider the recovered 2D and 3D 21-cm power 
spectra and how best we can minimise the effects of leakage. 
The noise leakage into the reconstructed foregrounds can be 
accurately quantified by creating an independent realization 
of the noise (no2) (for real data it is assumed we will know 
the statistics of the noise to high precision and can there- 
fore create a second realization from the known noise power 
spectrum) and applying Equation [6] to find L n0 2- We find 
that the power spectra of L no and L n0 2 are almost identical 
meaning that A is not a strong function of the original noise 



realization. We can use this information to find a better es- 
timate of the 2D and 3D power spectra and compensate for 
the noise leakage using the following calculation: 



■ aL + aL 



(8) 



To calculate a leakage ratio we would generally have 
to include all the leakages, such that Leakage Ratio — 

A 2 

However, since we can correct for L no 



we can instead calculate: 



Leakage Ratio : 



R fg 



(9) 



which quantifies the amount of 21-cm leakage into the 
foregrounds and vice versa. We can see the leakage ratio 
plotted for the 2D and 3D power spectra for multiple fre- 
quencies in Fig. [5] We see that we can expect an accurate 
21-cm power spectrum recovery at large k once the noise 
leakage is compensated for. The smaller k recovery is still 
very dependent on the foreground magnitude and therefore 
frequency of observation. 

We can see the result of applying Equation [8] in Figs. 
[6] and [7] Wherever Rf g and/or L noC s exceeds the power of 
the simulated 21-cm signal we see a degraded fit in Fig. [6] 
The 2D power spectra are recovered to excellent accuracy 
and we see that once the noise leakage is taken into account 
there is much less leakage at large k scales, allowing a more 
complete power spectrum reconstruction. 

For the 3D power spectra, a similarly accurate recovery 
is made across the frequency range. The recovery in 3D is 
more precise due to the larger amount of data in a box as 



The Scale of the Problem 7 




-2.0 -1.5 -1.0 -0.5 0.0 -2.0 -1.5 -1.0 -0.5 0.0 

log 10 (k p er P / (h Mpc" 1 )) log 10 (k perp / (h Mpc 1 )) 



Figure 6. 2D power spectrum of the simulated 21-cm signal, reconstructed 21-cm signal, residuals and noise at 130 MHz, or z=9.92, 
150 MHz, or z=8A7 and 170 MHz, or z=7.35 from top to bottom. The left column is the fiducial data whereas the right hand column 
plots the reconstructed 21-cm power spectrum but with the second noise realization added, as described in Section |4. 3. 2| Linestyles are 
as described in Fig. pUwith the additional dark green dashed line representing the total leakage power (A^ + A^ ). 



opposed to a single slice. Again, once the noise leakage is 
taken into account the recovered spectra become much more 
complete on the larger k scales. 



5 PHASE CONSERVATION AND IMAGING 

Recently it has been shown that the imaging of the neu- 
tral hydrogen in the late stages of reionization is possible 
with the current generation of radio telescopes when angu- 
lar scales larger than 0.5° are considered, independent of the 
type of reionization source dZaroubi et al.||2012|). Here, we 
compare the output residual maps with the simulated 21-cm 
maps and consider how well the phases of the 21-cm signal 
are conserved through the foreground removal process. The 
better the phases are conserved, the more correlation be- 
tween maps we will observe. We will also consider the maps 
at different scales and as such we also decompose the output 
maps into 8 wavelet scales using the IUWT. 

For a particular frequency, we calculate the phase of 
each uv point in a Fourier transformed map, F, as Phase [u,v] 
= tan- 1 (Im(F(u,v))/Re(F(u,v))). 

In Fig. [8] we see the phase density relating to each 
wavelet scale of the 21-cm and residual cubes. For each fre- 
quency we calculate the phase of each pixel in the 21-cm 
and residual maps and use this as a coordinate on a phase 
density map where the phase of the residuals is the x axis 
and the y axis is the phase of the 21-cm map. The more pix- 
els with coordinates corresponding to a particular bin in the 
phases, the higher the phase density we will observe. If GMCA 
perfectly preserves the phase of the 21-cm signal we should 



see a diagonal phase density plot. For clarity, only the phase 
bins with a pixel count in the largest 67% of the pixel count 
distribution are plotted. We see that for the three crudest 
wavelet scales there is excellent phase recovery accross the 
frequency range, while at 160 MHz this excellent recovery 
can be seen on even smaller wavelet scales. 

In Fig. [9] we start with the crudest wavelet scale and 
then add the next crudest scale one at a time to see the 
effect on phase conservation, finding that this improves the 
recovery considerably. 

In Fig. [lO] we compare the reconstructed signal maps 
with the simulated signal maps at the different wavelet 
scales. It is clear that the foregrounds are reconstructed to a 
high accuracy at all scales. The residuals show clear correla- 
tion with the 21-cm maps - especially at scales 54-434 Mpc. 
By considering the data on different scales we compensate 
for the small scale noise leakage and can retrieve convinc- 
ing reconstructed images in comparison to the map with all 
scale data included. 

We plot the Pearson correlation coefficient between the 
simulated 21-cm maps and the residual maps at different 
individual scales in the top panel of Fig. [IT] On an individual 
basis, we can clearly see that the finer the scale of structure 
we correlate, the less of a correlation there is between the 
residuals and simulated 21-cm maps. The finer wavelet scale 
we look at, the more dominant noise leakage will be in the 
residual maps and so a smaller correlation is observed. By 
only comparing the crudest wavelet scales however, we risk 
losing a lot of the small scale 21-cm structure and being 
increasingly dominated by the foreground signal. 
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Figure 7. 3D power spectrum of the simulated 21-cm signal, reconstructed 21-cm signal, residuals and noise at 135 MHz, or z=9.51, 
151 MHz, or z=8.40 and 167 MHz, or z=7.50 over an 8 MHz sub band (top to bottom). The left column is the fiducial data whereas the 
right hand column plots the reconstructed 21-cm power spectrum but with the second noise realization leakage added. Linestyles are as 
described in Fig. [6] 
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Figure 11. The Pearson correlation coefficient between 21-cm 
and residual maps. There is very little correlation when all scales 
are present or on the smallest scales, however we reach correlation 
coefficients of over 0.6 for distance scales 54 - 434 Mpc. The corre- 
lations are always much weaker at the lower end of the frequency 
range because the noise and foregrounds are at their highest and 
at the higher end of the frequency range because the 21-cm signal 
is negligible. 



We can add several of the wavelet scales together in or- 
der to balance having enough useful information without in- 
cluding too much noise leakage on the smaller wavelet scales. 



ages of the full data which have been smoothed with a 20 
arcminute Gaussian kernel. Wavelet decomposition has the 
advantage of providing a selection of scales on which you can 
analyse the images, as opposed to being restricted by niters 
such as a Gaussian kernel which simply remove all modes 
below a certain scale. However, the scales at which you can 
analyse images with wavelet decomposition are determined 
by the method itself - you cannot then ask what the data 
looks like at a scale halfway in-between two wavelet scales. 



In Fig. [12] we recover impressive images of the reioniza- 
tion signal when the smallest scale information is discarded. 
Comparing the residual and 21-cm maps on each row we find 
correlation coefficients of 0.689, 0.687 and 0.588 for the top, 
middle and bottom rows respectively. We therefore conclude 
that the wavelet decomposition more optimally removes the 
noise from the residuals than the smoothing technique (bot- 

In Fig. ~~ 



10 



torn row) employed by Zaroubi et al. (2012). 
can see the wavelet decomposition has the effect of cluster- 
ing the noise around the edges of the image (this effect is 
particularly pronounced in rows 3 and 4) . To avoid this edge 
effect we also correlate the maps in Fig. ^] again but this 
time only considering the central 50% of the image. We find 
correlation coefficients of 0.905, 0.788 and 0.605 for the top, 
middle and bottom rows respectively. With real data, we can 
assume that this edge effect would not be a problem as the 
fitting and decomposition can always be carried out over a 
slightly larger cube with only the central region being used 
for data analysis. 
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Figure 8. Density maps of the phase of maps of the simulated 21-cm and the residuals. From left to right are maps at frequencies 
130, 145, 160 and 175 MHz. From top to bottom are maps of the complete cubes and then of increasingly small scale wavelet scales. 
A clear diagonal signifies excellent phase recovery and therefore clearer images can be recovered. We see that on scales above 108 Mpc, 
the phases are well preserved however on smaller scales, the phases are highly uncorrelated. It is clear that considering different wavelet 
scales can result in much better phase recovery than considering the full cube. 
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Figure 9. Density maps of the phase of maps of the simulated 21-cm and the residuals. From left to right are maps at frequencies 130, 
145, 160 and 175 MHz. From top to bottom are maps of cubes with only the crudest scale present and then of only the 2,3,4,5,6 and 7 
crudest wavelet scales. The minimum distance scale information included is labelled for each wavelet scale, the maximum is always 1734 
Mpc. A clear diagonal signifies excellent phase recovery and therefore clearer images can be recovered. The addition of several scales 
together results in clearer diagonals than considering scales individually in Fig. [8] 
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Figure 10. The decomposition of the 21-cm signal, residuals, simulated noise + 21-cm signal, noise, foregrounds, reconstructed fore- 
grounds and noise (left to right). From top to bottom, the rows are the original image at 165MHz, and then the wavelet decomposition 
of this image at the 8 wavelet scales. We can see that the simulated and reconstructed foregrounds have a high correlation at all scales 
and even in the full cube. Similarly, the noise + 21-cm and residuals also share this strong correlation. As we cannot remove the noise 
directly we must look for a correlation between the residuals and simulated 21-cm, which will come as a result of little or no correlation 
between the noise and residuals at certain scales. The noise dominates too much in the full cube and on the large k scales, however we 
can clearly see a correlation by eye on distance scales between 108 and 434 Mpc. At the largest scale, the 21-cm signal is so small that 
the residuals are dominated by noise. 
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Figure 12. In the left column we show the 21-cm signal and 
in the right column the residuals of GMCA. In the top row only 
distance scales between 1734 and 108 Mpc are included, in the 
middle only distance scales between 1734 and 54 Mpc are in- 
cluded and on the bottom the images with all scales present have 
been smoothed with a 57 Mpc 20 arcminutes) Gaussian kernel. 
Clear correlations can be seen between the columns (coefficients 
of 0.689, 0.687 and 0.588 for the top, middle and bottom rows 
respectively). Taking only the central 50% of the image we find 
correlation coefficients of 0.905, 0.788 and 0.605 for the top, mid- 
dle and bottom rows respectively. 



6 CONCLUSIONS 

In this paper we have assessed the sparsity-based blind 
source separation technique GMCA as a possible way of re- 
moving the foregrounds on a 21-cm EoR signal. We recover 
the ID, 2D and 3D power spectra to high accuracy across 
the frequency range. Since the mixing matrix calculated by 
GMCA was shown not to be a strong function of the noise 
realization, we were able to compensate for leakage of noise 
power into the reconstructed foregrounds using an indepen- 
dent noise realization, leading to more complete 2D and 3D 
power spectra. 

We also considered if images of reionization could be re- 
covered from the LOFAR-EoR data once foreground removal 
with GMCA has been carried out. Using wavelet decomposi- 
tion, we considered the phase correlation between the GMCA 
residuals and the simulated 21-cm at different scales. We 
find strong correlations at the cruder wavelet scales and add 
several scales together to balance the amount of information 
in an image with the accuracy of the phase recovery. We find 
that when distance scales of below 54 Mpc are discounted, 
the GMCA residuals images are highly correlated with the 
21-cm images, with correlation coefficients of just less than 



0.7. In comparison, smoothing the images did not produce 
as strong a correlation. 

GMCA is a highly adaptable method and there remains 
the possibility that with careful tuning, the 21-cm signal 
could be picked out as a separate source as opposed to being 
present as a residual of the process. We intend to explore this 
further and consider using different mixing matrices for each 
scale. 
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